204 research outputs found

Comparative genomics reveals birth and death of fragile regions in mammalian evolution

Author: Alekseyev Max A
Pevzner Pavel A
Publication venue: BioMed Central
Publication date: 01/01/2010

Springer - Publisher Connector

PubMed Central

Are There Rearrangement Hotspots in the Human Genome?

Author: Alekseyev Max A.
Pevzner Pavel A.
Publication venue: Scholar Commons
Publication date: 01/01/2007

In a landmark paper, Nadeau and Taylor [18] formulated the random breakage model (RBM) of chromosome evolution that postulates that there are no rearrangement hotspots in the human genome. In the next two decades, numerous studies with progressively increasing levels of resolution made RBM the de facto theory of chromosome evolution. Despite the fact that RBM had prophetic prediction power, it was recently refuted by Pevzner and Tesler [4], who introduced the fragile breakage model (FBM), postulating that the human genome is a mosaic of solid regions (with low propensity for rearrangements) and fragile regions (rearrangement hotspots). However, the rebuttal of RBM caused a controversy and led to a split among researchers studying genome evolution. In particular, it remains unclear whether some complex rearrangements (e.g., transpositions) can create an appearance of rearrangement hotspots. We contribute to the ongoing debate by analyzing multi-break rearrangements that break a genome into multiple fragments and further glue them together in a new order. In particular, we demonstrate that (1) even if transpositions were a dominant force in mammalian evolution, the arguments in favor of FBM still stand, and (2) the “gene deletion” argument against FBM is flawed.

CiteSeerX

Directory of Open Access Journals

Scholar Commons - Institutional Repository of the University of South Carolina

PubMed Central

Sum-of-squares lower bounds for planted clique

Author: Alan
Arora Sanjeev
Berthet Q.
Karp R. M.
Pevzner Pavel A
Publication date: 22/03/2015

Finding cliques in random graphs and the closely related "planted" clique variant, where a clique of size k is planted in a random G(n, 1/2) graph, have been the focus of substantial study in algorithm design. Despite much effort, the best known polynomial-time algorithms only solve the problem for k ~ sqrt(n). In this paper we study the complexity of the planted clique problem under algorithms from the Sum-of-squares hierarchy. We prove the first average case lower bound for this model: for almost all graphs in G(n,1/2), r rounds of the SOS hierarchy cannot find a planted k-clique unless k > n^{1/2r} (up to logarithmic factors). Thus, for any constant number of rounds planted cliques of size n^{o(1)} cannot be found by this powerful class of algorithms. This is shown via an integrality gap for the natural formulation of the maximum clique problem on random graphs for SOS and Lasserre hierarchies, which in turn follows from degree lower bounds for the Positivstellensatz proof system. We follow the usual recipe for such proofs. First, we introduce a natural "dual certificate" (also known as a "vector-solution" or "pseudo-expectation") for the given system of polynomial equations representing the problem for every fixed input graph. Then we show that the matrix associated with this dual certificate is PSD (positive semi-definite) with high probability over the choice of the input graph. This requires the use of certain tools. One is the theory of association schemes, and in particular the eigenspaces and eigenvalues of the Johnson scheme. Another is a combinatorial method we develop to compute (via traces) norm bounds for certain random matrices whose entries are highly dependent; we hope this method will be useful elsewhere.

arXiv.org e-Print Archive

Crossref
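
For readers unfamiliar with the planted clique setup described in the abstract above, the following minimal sketch samples a G(n, 1/2) graph and plants a clique of size k in it. The function names, the seed parameter, and the choice n = 200, k = 15 are illustrative assumptions, not taken from the paper, and the sketch only builds an instance; it does not implement any Sum-of-squares algorithm.

```python
import itertools
import random


def planted_clique_graph(n, k, seed=0):
    """Sample G(n, 1/2), then plant a clique on k randomly chosen vertices.

    Returns (edges, clique), where edges is a set of pairs (u, v) with u < v.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Each possible edge is included independently with probability 1/2.
    edges = {
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < 0.5
    }
    clique = sorted(rng.sample(range(n), k))
    # Planting: force every pair inside the chosen vertex set to be adjacent.
    edges.update(itertools.combinations(clique, 2))
    return edges, clique


def is_clique(edges, vertices):
    """Check that every pair of the given vertices is joined by an edge."""
    return all(
        (u, v) in edges
        for u, v in itertools.combinations(sorted(vertices), 2)
    )


edges, clique = planted_clique_graph(n=200, k=15)
print(len(clique), is_clique(edges, clique))
```

The search problem discussed in the abstract is the reverse direction: given only `edges`, recover `clique`.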

What is the difference between the breakpoint graph and the de Bruijn graph?

Author: Lin Yu
Nurk Sergey
Pevzner Pavel A.
Publication venue: Springer Science and Business Media LLC
Publication date: 29/11/2018

The breakpoint graph and the de Bruijn graph are two key data structures in the studies of genome rearrangements and genome assembly. However, the classical breakpoint graphs are defined on two genomes (represented as sequences of synteny blocks), while the classical de Bruijn graphs are defined on a single genome (represented as DNA strings). Thus, the connection between these two graph models is not explicit. We generalize the notions of both the breakpoint graph and the de Bruijn graph, and make it transparent that the breakpoint graph and the de Bruijn graph are mathematically equivalent. The explicit description of the connection between these important data structures provides a bridge between two previously separated bioinformatics communities studying genome rearrangements and genome assembly.

The Australian National University
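
As background for the abstract above, here is a minimal sketch of the classical de Bruijn graph of a single DNA string: nodes are (k-1)-mers and each k-mer contributes one edge from its prefix to its suffix. This is the textbook construction only, not the generalized graph the paper introduces; the function name and the example string are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(sequence, k):
    """Build the classical de Bruijn graph of one string.

    Nodes are (k-1)-mers; every k-mer occurrence adds an edge
    prefix -> suffix, so repeated k-mers give parallel edges.
    Hypothetical helper for illustration only.
    """
    graph = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


graph = de_bruijn_graph("ACGTACGA", 3)
for node, successors in graph.items():
    print(node, "->", successors)
```

The repeated 3-mer ACG yields two parallel AC -> CG edges, which is how repeats in a genome show up in this representation.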

SpectroGene: A Tool for Proteogenomic Annotations Using Top-Down Spectra

Author: Kolmogorov Mikhail
Liu Xiaowen
Pevzner Pavel A.
Publication venue: American Chemical Society (ACS)
Publication date: 04/01/2016

In the past decade, proteogenomics has emerged as a valuable technique that contributes to the state-of-the-art in genome annotation; however, previous proteogenomic studies were limited to bottom-up mass spectrometry and did not take advantage of top-down approaches. We show that top-down proteogenomics allows one to address the problems that remained beyond the reach of traditional bottom-up proteogenomics. In particular, we show that top-down proteogenomics leads to the discovery of previously unannotated genes even in extensively studied bacterial genomes and present SpectroGene, a software tool for genome annotation using top-down tandem mass spectra. We further show that top-down proteogenomics searches (against the six-frame translation of a genome) identify nearly all proteoforms found in traditional top-down proteomics searches (against the annotated proteome). SpectroGene is freely available at http://github.com/fenderglass/SpectroGene

IUPUIScholarWorks
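
The SpectroGene abstract above refers to searching against the six-frame translation of a genome. The sketch below shows what that translation is, assuming the standard genetic code and an uppercase or lowercase ACGT input; it is not SpectroGene's code, and all names in it are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return translations for 3 forward and 3 reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return {
        (strand, offset): translate(seq[offset:])
        for strand, seq in (("+", dna), ("-", reverse))
        for offset in range(3)
    }


frames = six_frame_translation("ATGGCCTAA")
for frame, protein in frames.items():
    print(frame, protein)
```

Searching all six frames avoids relying on existing gene annotations, which is why it can surface genes that the annotated proteome omits.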

The Fragile Breakage versus Random Breakage Models of Chromosome Evolution

Author: Peng Qian
Pevzner Pavel A
Tesler Glenn
Publication venue: Public Library of Science
Publication date: 01/02/2006

For many years, studies of chromosome evolution were dominated by the random breakage theory, which implies that there are no rearrangement hot spots in the human genome. In 2003, Pevzner and Tesler argued against the random breakage model and proposed an alternative “fragile breakage” model of chromosome evolution. In 2004, Sankoff and Trinh argued against the fragile breakage model and raised doubts that Pevzner and Tesler provided any evidence of rearrangement hot spots. We investigate whether Sankoff and Trinh indeed revealed a flaw in the arguments of Pevzner and Tesler. We show that Sankoff and Trinh's synteny block identification algorithm makes erroneous identifications even in small toy examples and that their parameters do not reflect the realities of the comparative genomic architecture of human and mouse. We further argue that if Sankoff and Trinh had fixed these problems, their arguments in support of the random breakage model would disappear. Finally, we study the link between rearrangements and regulatory regions and argue that long regulatory regions and inhomogeneity of gene distribution in mammalian genomes may be responsible for the breakpoint reuse phenomenon.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

De novo Inference of Diversity Genes and Analysis of Non-canonical V(DD)J Recombination in Immunoglobulins

Author: Pevzner Pavel A.
Safonova Yana
Publication venue: Frontiers Media SA
Publication date: 01/05/2019

The V(D)J recombination forms the immunoglobulin genes by joining the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics aims at finding alleles of germline genes across various patients. Although recent studies described algorithms for de novo inference of V and J genes from immunosequencing data, they stopped short of solving a more difficult problem of reconstructing D genes that form the highly divergent CDR3 regions and provide the most important contribution to the antigen binding. We present the IgScout algorithm for de novo D gene reconstruction and apply it to reveal new alleles of human D genes and previously unknown D genes in camel, an important model organism in immunology. We further analyze non-canonical V(DD)J recombination that results in unusually long CDR3s with tandem fused IGHD genes and thus expands the diversity of the antibody repertoires. We demonstrate that tandem CDR3s represent a consistent and functional feature of all analyzed immunosequencing datasets, reveal ultra-long CDR3s, and shed light on the mechanism responsible for their formation.

Directory of Open Access Journals

FigShare

Identifying Repeat Domains in Large Genomes

Author: Pevzner Pavel A
Price Alkes
Raphael Benjamin J
Tang Haixu
Zhi Degui
Publication venue: Springer Science and Business Media LLC
Publication date: 18/11/2010

We present a graph-based method for the analysis of repeat families in a repeat library. We build a repeat domain graph that decomposes a repeat library into repeat domains, short subsequences shared by multiple repeat families, and reveals the mosaic structure of repeat families. Our method recovers documented mosaic repeat structures and suggests additional putative ones. Our method is useful for elucidating the evolutionary history of repeats and annotating de novo generated repeat libraries.

Harvard University - DASH